# Fix: deterministic multithreaded lidar odometry #379
Merged
JanuszBedkowski merged 2 commits into MapsHD:main, Feb 28, 2026
…er-pose accumulators, guaranteeing ST=MT bit-identical results regardless of thread count
## Problem

`optimize_lidar_odometry` gives different results between multithreaded runs.

The multithreaded path used `tbb::combinable<MatrixPair>` + `combine_each` to sum per-thread Hessian copies. `combine_each` iterates thread-local storage in unspecified order, and since floating-point addition is not associative (`(a+b)+c != a+(b+c)`), different summation orders produce different Hessian matrices. Over many optimiser iterations these small differences accumulate and lead to different convergence paths.
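The effect of summation order can be reproduced in isolation. The following is a minimal sketch, not code from this PR: `sum_in_order` is a hypothetical helper that adds the same values in a caller-chosen order, which is roughly what an unordered `combine_each` amounts to.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the PR): sums `values` in the index
// order given by `order`, mimicking a reduction whose order is not fixed.
double sum_in_order(const std::vector<double>& values,
                    const std::vector<std::size_t>& order)
{
    assert(order.size() == values.size());
    double sum = 0.0;
    for (std::size_t idx : order)
    {
        sum += values[idx]; // each addition rounds, so the order matters
    }
    return sum;
}
```

With the values `{1e16, 1.0, -1e16}`, the order `{0, 2, 1}` cancels the large terms first and keeps the `1.0`, while the order `{0, 1, 2}` loses the `1.0` to rounding before the cancellation, so the two sums differ.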
## Why not per-point storage?

The straightforward fix is to store each point's 6×6 + 6×1 contribution separately and sum them in fixed order. This was tried but makes the whole optimization ~60% slower due to two effects:

- the final summation cannot be parallelized (order must be fixed for determinism)
- writing to per-point storage instead of local accumulators hurts cache during the compute phase
## Fix: fixed-chunk per-pose accumulators

Split points into 128 fixed-size chunks. Each chunk has its own per-pose 6×6 and 6×1 accumulator matrices (`chunk_AtPA[chunk][pose]`, `chunk_AtPB[chunk][pose]`).

- Compute phase (`tbb::parallel_for` over chunks, or sequential `for`): each chunk zeros its own accumulators, then iterates its point range and accumulates into `chunk_AtPA[chunk][pose]` / `chunk_AtPB[chunk][pose]`
- Reduce phase: the chunk accumulators are summed in fixed chunk×pose order
This is deterministic because the chunk boundaries and the summation order, both within a chunk and across chunks, are fixed and do not depend on the thread count. Both ST and MT paths iterate over the same 128 chunks using the same `process_chunk` lambda. Only the loop type differs (`tbb::parallel_for` vs plain `for`), guaranteeing bit-identical results.

Performance: ~same as the original non-deterministic code.
`NUM_CHUNKS = 128` must be >= max number of CPU cores to ensure all cores get work. 128 is sufficient for current hardware while keeping the chunk overhead negligible.
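The fixed-chunk scheme can be illustrated with a simplified sketch. It is not the PR's implementation: a scalar per pose stands in for the 6×6 / 6×1 blocks, `Contribution` and `accumulate_fixed_chunks` are hypothetical names, and instead of `tbb::parallel_for` the chunks are processed in a caller-supplied order that stands in for an arbitrary thread schedule.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t NUM_CHUNKS = 128;

// Hypothetical stand-in for one point's contribution to one pose.
struct Contribution
{
    std::size_t pose;
    double value; // stands in for the 6x6 and 6x1 blocks
};

// `chunk_order` is the order in which chunks are processed. In real MT code
// the chunks would run concurrently; here the order is varied explicitly.
std::vector<double> accumulate_fixed_chunks(const std::vector<Contribution>& points,
                                            std::size_t num_poses,
                                            const std::vector<std::size_t>& chunk_order)
{
    assert(chunk_order.size() == NUM_CHUNKS);
    // One accumulator per (chunk, pose), zero-initialised.
    std::vector<std::vector<double>> chunk_acc(NUM_CHUNKS, std::vector<double>(num_poses, 0.0));
    // Chunk boundaries depend only on the number of points.
    const std::size_t chunk_size = (points.size() + NUM_CHUNKS - 1) / NUM_CHUNKS;

    auto process_chunk = [&](std::size_t chunk)
    {
        const std::size_t begin = std::min(chunk * chunk_size, points.size());
        const std::size_t end = std::min(begin + chunk_size, points.size());
        for (std::size_t i = begin; i < end; ++i)
        {
            // Each chunk writes only to its own row, so chunks never interfere.
            chunk_acc[chunk][points[i].pose] += points[i].value;
        }
    };

    for (std::size_t chunk : chunk_order)
    {
        process_chunk(chunk);
    }

    // Reduce sequentially in fixed chunk x pose order.
    std::vector<double> total(num_poses, 0.0);
    for (std::size_t chunk = 0; chunk < NUM_CHUNKS; ++chunk)
    {
        for (std::size_t pose = 0; pose < num_poses; ++pose)
        {
            total[pose] += chunk_acc[chunk][pose];
        }
    }
    return total;
}
```

Because every chunk sums its own point range in index order and the reduce always walks chunks 0 to 127, the totals come out identical whether the chunks are processed forwards, backwards, or concurrently.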
## Changes in `lidar_odometry_utils_optimizers.cpp`

- Removed `#include <tbb/blocked_range.h>` (no longer used)
- `add_indoor_hessian_contribution` / `add_outdoor_hessian_contribution`: write to fixed-size 6×6 and 6×1 output refs instead of indexing into the global Hessian via `block<6,6>(offset, offset)`. Sign convention changed: helpers accumulate positive `AtPB`, reduce subtracts once (`AtPBndt -= chunk_AtPB`)
- `compute_hessian`: takes `Mat6x6& out_AtPA` and `Vec6x1& out_AtPB` instead of `Eigen::MatrixXd&` + `matrix_offset`
- Replaced `tbb::combinable<MatrixPair>` with the 128 fixed-chunk approach (static `std::vector<std::vector<Mat6x6/Vec6x1>>` to avoid reallocation across ~1354 calls per dataset)
- `tbb::combinable<LookupStats>` kept for integer lookup counters (accumulation order doesn't matter for integers)
- `UTL_PROFILER_END(before_iter)`: was inside `process_worker_step_lidar_odometry_core` but its `BEGIN` was in the caller; moved back to caller
- Added `UTL_PROFILER_SCOPE` to `process_worker_step_1`, `process_worker_step_2`, and `process_worker_step_lidar_odometry_core`

Note: This was not caught in the previous nondeterminism PR because the floating-point differences are very small and only showed up on certain datasets during extended testing.
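The sign convention for `AtPB` described in the list above can be sketched as follows. This is an illustration under assumptions: `Vec6` is a `std::array` stand-in for `Vec6x1`, `add_contribution` and `reduce_chunks` are hypothetical names, and the per-pose dimension is left out for brevity.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Vec6 = std::array<double, 6>; // stand-in for Vec6x1

// Hypothetical helper: accumulates one point's contribution into the
// chunk-local output with a positive sign.
void add_contribution(const Vec6& point_AtPB, Vec6& out_AtPB)
{
    for (std::size_t i = 0; i < 6; ++i)
    {
        out_AtPB[i] += point_AtPB[i];
    }
}

// Reduce step: subtracts each chunk accumulator from the global vector
// exactly once, walking the chunks in fixed index order.
void reduce_chunks(const std::vector<Vec6>& chunk_AtPB, Vec6& global_AtPB)
{
    for (const Vec6& chunk : chunk_AtPB)
    {
        for (std::size_t i = 0; i < 6; ++i)
        {
            global_AtPB[i] -= chunk[i];
        }
    }
}
```

Keeping the helpers sign-free and applying the negation only in the reduce step means the sign is handled in a single place.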
## Testing
Tested on 7 datasets (4 MT + 4 ST runs each), lengths from 500m to 5000m.
All runs produce identical results between ST and MT and between runs.
Probably related to #338: the non-deterministic results reported there could be caused by this `tbb::combinable` summation order issue.
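A check for bit-identical results between two runs, as claimed under Testing, could look like the sketch below. It is not the test harness used for this PR: `bit_identical` is a hypothetical helper and assumes each run's output has been flattened into a `std::vector<double>`.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: returns true only if both runs produced exactly the
// same bytes. This is stricter than operator==, which treats 0.0 and -0.0
// as equal.
bool bit_identical(const std::vector<double>& run_a, const std::vector<double>& run_b)
{
    if (run_a.size() != run_b.size())
    {
        return false;
    }
    if (run_a.empty())
    {
        return true; // nothing to compare; avoids calling memcmp on empty buffers
    }
    return std::memcmp(run_a.data(), run_b.data(), run_a.size() * sizeof(double)) == 0;
}
```

A byte-wise comparison is used instead of a tolerance because the goal here is exact reproducibility, not numerical closeness.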